Database Design

I. Relational Database Design

Resources

Database Design

Database design is the process of organizing and structuring data in a logical manner to efficiently store, manage, and retrieve information in a database system. It involves defining the database structure, relationships between data elements, and constraints to ensure data integrity and consistency.

Database design is important for several reasons:

Efficiency: A well-designed database optimizes data storage and retrieval, improving system performance.
Data integrity: Proper design helps maintain accurate and consistent data across the database.
Scalability: Good design allows the database to grow and adapt to changing requirements over time.
Reduced redundancy: It minimizes data duplication, saving storage space and reducing inconsistencies.
Improved data security: Proper design facilitates the implementation of access controls and security measures.

1. Schema Design

A database schema is a formal description of the structure, organization, and relationships of data within a database. It serves as a blueprint for how data is stored and accessed. The schema defines:

Tables and their attributes (columns)
Relationships between tables
Constraints and rules for data integrity
Data types for each attribute
Indexes and other performance optimization structures

There are three levels to schema design. These three levels form a progression from abstract to concrete, with each level adding more detail and specificity.

a. Conceptual

This is the highest level of abstraction.
It provides a big-picture view of what the database will contain.
It focuses on identifying main entities, their attributes, and relationships.
Usually represented using an Entity-Relationship Diagram (ERD).
Does not include technical details.

b. Logical

This is a more detailed representation of the database structure.
It translates the conceptual model into a specific data model (e.g., relational, object-oriented).
Defines tables, columns, relationships, and constraints.
Still independent of any specific database management system (DBMS).

Look up Tables

A lookup table is a data structure used to map, reference, or convert data efficiently. In database design and computer science, lookup tables serve various purposes:

speeding up data retrieval
standardizing information
simplifying data relationships
etc.

Example: A typical example is a table that maps status codes to their descriptions. For instance, in a customer order system:

1 → Pending
2 → Shipped
3 → Delivered

This allows the system to store the numeric code in the order table, saving space and standardizing data while the lookup table provides the descriptive text.

CREATE TABLE status_codes (
    status_id INT PRIMARY KEY,
    description VARCHAR(50)
);

INSERT INTO status_codes (status_id, description)
VALUES
    (1, 'Pending'),
    (2, 'Shipped'),
    (3, 'Delivered');

status_id	description
1	Pending
2	Shipped
3	Delivered

product_id	product_name	product_status
1	Latop	1
2	Sofa	1
3	TV	3
4	Smartphone	2

By storing product_status as a numeric code and linking it to the status_codes lookup table, you achieve efficient data storage and maintain consistency.

When you need to update the description of a status, you only need to modify the status_codes table, and the products table will automatically reflect the updated description through joins.

c. Physical

This is the lowest level, describing how data is actually stored.
It includes specific details for implementation in a particular DBMS.
Defines storage structures, file organizations, indexes, access methods, and security measures.
Optimized for performance on the chosen DBMS platform.

2. Data Integrity

Data integrity refers to the overall accuracy, completeness, consistency, and reliability of data throughout its lifecycle in a database system. It's a crucial aspect of database design and management that ensures data remains valid, trustworthy, and usable over time.

There are three types of data integrity:

1. Entity Integrity

Ensures that each row in a table is a unique entity.
Primarily enforced through primary keys.
Prevents duplicate or null entries in columns designated as primary keys.
Example: In a student database, each student record must have a unique student ID.

2. Referential Integrity

Maintains consistent relationships between tables.
Enforced through foreign keys.
Ensures that values in a foreign key column correspond to existing values in the referenced table's primary key.
Prevents orphaned records (records with foreign key values that don't exist in the related table).
Example: In an order system, each order must be associated with a valid customer ID that exists in the customer table.

3. Domain Integrity

Ensures that all data in a column falls within a defined domain or set of acceptable values.
Enforced through various constraints like data types, check constraints, default values, and rules.
Maintains the accuracy and validity of data within each field.
Example: Ensuring that an 'age' column only accepts positive integers, or that a 'grade' column only accepts values A, B, C, D, or F.

CREATE TABLE students (
    student_id INT PRIMARY KEY,
    name VARCHAR(100),
    age INT CHECK (age > 0), -- Ensures age is positive
    grade CHAR(1) CHECK (grade IN ('A', 'B', 'C', 'D', 'F')) -- Ensures grade is within allowed values
);

II. Database Relationship

In a database, relationships define how tables are connected and how data is associated between them.

Parent - Child Tables

Parent and child tables are terms used in relational database design to describe the relationship between two tables connected by a foreign key.

Parent Table: A parent table, also known as a referenced table, is the table that contains the primary key which is referenced by a foreign key in another table.
Child Table: A child table, also known as a referencing table, is the table that contains the foreign key which references the primary key of the parent table.

The relationship between parent and child tables is typically a one-to-many relationship, where one record in the parent table can be associated with multiple records in the child table.

CREATE TABLE Authors (
    author_id INT PRIMARY KEY,
    author_name VARCHAR(100)
);

CREATE TABLE Books (
    book_id INT PRIMARY KEY,
    title VARCHAR(200),
    author_id INT,
    FOREIGN KEY (author_id) REFERENCES Authors(author_id)
);

Joiner Table

A joiner table (junction, bridge table) is a table in a relational database used to connect two or more tables, typically to resolve many-to-many relationships. Its primary key is usually a composite key composed of foreign keys referencing the primary keys of the tables it's connecting.

Example: We have the employees and clients tables. To express the many-to-many relationship where an employee can work with multiple clients and a client can be served by multiple employees, we create a joiner table called works_with.

+--------+-----------+-------------+
| emp_id | client_id | total_sales |
+--------+-----------+-------------+
| 107    | 400       | 55,000      |
| 101    | 400       | 267,000     |
| 101    | 404       | 22,500      |
+--------+-----------+-------------+

The joiner table works_with has a composite primary key consisting of emp_id and client_id. This combination uniquely identifies each employee-client relationship.

emp_id is a foreign key referencing the employees table.
client_id is likely a foreign key referencing the clients table.

There are three primary types of relationships:

1. One-to-One (1:1)

In a one-to-one relationship, each record in one table corresponds to exactly one record in another table, and vice versa.

Example: Table: users and userProfiles
- Explanation: Each user has one unique profile, and each profile is associated with one user. The relationship between the tables users and userProfiles is one-to-one.

Designing One-to-One Relationships

Implementation: Usually, you'd have a foreign key in one table referencing the primary key in the other. This foreign key should have a unique constraint.

-- Example Schema
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50) UNIQUE,
    email VARCHAR(100) UNIQUE,
    password_hash VARCHAR(255)
);

CREATE TABLE userProfiles (
    profile_id INT PRIMARY KEY,
    user_id INT UNIQUE,
    full_name VARCHAR(100),
    bio TEXT,
    date_of_birth DATE,
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

2. One-to-Many (1:M)

In a one-to-many relationship, a record in one table can have multiple associated records in another table, but each record in the second table is associated with only one record in the first table.

Example: Tables: users and cards
- Explanation: A single user can have many cards, but each card belongs to only one user. The relationship between users and cards is one-to-many.

Designing One-to-Many Relationships

Implementation: Add a foreign key in the "many" table that references the primary key of the "one" table.

-- Example Schema
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  user_name VARCHAR(100)
);

CREATE TABLE cards (
  card_id INT PRIMARY KEY,
  card_number VARCHAR(10),
  user_id INT,
  FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

3. Many-to-Many (M:M)

In a many-to-many relationship, records in one table can be associated with multiple records in another table, and vice versa. This relationship is usually implemented using a joiner (or junction/bridge) table.

Example: Tables: students and courses
- Explanation: A student can enroll in multiple courses, and a course can have many students. To manage this relationship, a joiner table, Enrollment, is used to link students and courses. The enrollment table typically contains foreign keys that reference the primary keys of the students and courses tables.

Designing Many-to-Many Relationships

Implementation: Create a joiner table with foreign keys to both main tables.

CREATE TABLE students (
  student_id INT PRIMARY KEY,
  name VARCHAR(100)
);

CREATE TABLE courses (
  course_id INT PRIMARY KEY,
  course_name VARCHAR(100)
);

CREATE TABLE enrollments (
  student_id INT,
  course_id INT,
  enrollment_date DATE,
  PRIMARY KEY (student_id, course_id), -- composite key
  FOREIGN KEY (student_id) REFERENCES Students(student_id),
  FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);

4. Self-Referencing Relationship

In a self-referencing relationship, a table has a foreign key that references its own primary key.

This allows you to model hierarchical relationships, such as an employee reporting to a manager who is also an employee.

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT,
    FOREIGN KEY (manager_id) REFERENCES Employees(emp_id)
);

III. Database Normalization

Database normalization is the process of organizing the structure of a relational database to minimize redundancy and improve data integrity. The goal is to design a database that is both efficient and easy to maintain, by dividing large tables into smaller, related tables and defining relationships between them.

Normalization is typically achieved by following a series of steps called normal forms. Each normal form addresses specific types of redundancy and dependency.

1. First Normal Form (1NF)

Ensures that the table is organized into rows and columns where each column contains atomic (indivisible) values, and each column contains only one type of data.

Example: A table with a column that stores multiple phone numbers in a single field should be split so that each phone number is stored in a separate row or column.

Original table:

CustomerID	Name	Phone Numbers
1	Alice	123-456-7890, 987-654-3210
→ a 1NF compliant table:

CustomerID	Name	Phone Numbers
1	Alice	123-456-7890
1	Alice	987-654-3210

2. Second Normal Form (2NF)

Builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the primary key. This means that there should be no partial dependency of any column on the primary key.

A partial dependency occurs when a non-key attribute depends on only part of a composite primary key.

Example: In a table with a composite primary key (e.g., a table tracking student enrollments with student_id and course_id as keys), all other columns must depend on the full primary key, not just a part of it.

Original table:

Student_ID	Course_ID	Course_Name	Instructor
1	C1	Math	Smith
1	C2	Physics	Johnson
2	C1	Math	Smith
→ a 2NF compliant table: split into 2 tables `Enrollments` and `Courses`

Student_ID	Course_ID
1	C1
1	C2
2	C1

Course_ID	Course_Name	Instructor
C1	Math	Smith
C2	Physics	Johnson
C1	Math	Smith

3. Third Normal Form (3NF)

Further refines 2NF by ensuring that all non-key attributes are not only dependent on the primary key but are also non-transitively dependent. This means no non-key attribute should depend on another non-key attribute.

A transitive dependency occurs when a non-key attribute depends on another non-key attribute, rather than depending directly on the primary key.

Example: Consider a table OrderDetails with the following attributes:

OrderID	ProductID	ProductName	Category	CategoryManager
1001	P101	Laptop	Electronics	John Smith
1002	P102	Smartphone	Electronics	John Smith
1003	P201	Desk Chair	Furniture	Mary Johnson
1004	P101	Laptop	Electronics	John Smith

In this table:

OrderID and ProductID form the composite primary key.
ProductName depends on ProductID.
Category depends on ProductID.
CategoryManager depends on Category, not directly on the primary key.

This violates 3NF because CategoryManager is transitively dependent on the primary key through Category.

To comply with 3NF, we should split this into two tables: OrderDetails and Categories.

OrderID	ProductID	ProductName	Category
1001	P101	Laptop	Electronics
1002	P102	Smartphone	Electronics
1003	P201	Desk Chair	Furniture
1004	P101	Laptop	Electronics

Category	CategoryManager
Electronics	John Smith
Furniture	Mary Johnson

Now:

OrderDetails contains only attributes that are fully dependent on the primary key.
Categories separates the CategoryManager, which depends on Category, into its own table.

I. Relational Database Design​

1. Schema Design​

a. Conceptual​

b. Logical​

Look up Tables​

c. Physical​

2. Data Integrity​

1. Entity Integrity​

2. Referential Integrity​

3. Domain Integrity​

II. Database Relationship​

Parent - Child Tables​

Joiner Table​

1. One-to-One (1:1)​

Designing One-to-One Relationships​

2. One-to-Many (1:M)​

Designing One-to-Many Relationships​

3. Many-to-Many (M:M)​

Designing Many-to-Many Relationships​

4. Self-Referencing Relationship​

III. Database Normalization​

1. First Normal Form (1NF)​

2. Second Normal Form (2NF)​

3. Third Normal Form (3NF)​